[57] ABSTRACT

An arbitration scheme for providing deterministic bandwidth
and delay guarantees in an input-buffered crossbar switch
with speedup S is presented. Within the framework of a
crossbar architecture having a plurality of input channels
and output channels, the arbitration scheme determines the
sequence of fixed-size packet (or cell) transmissions
between the input channels and output channels satisfying
the constraint that only one cell can leave an input channel
and enter an output channel per phase in such a way that the
arbitration delay is bounded for each cell awaiting
transmission at the input channel. If the fixed-size packets
result from fragmentation of variable-size packets, the
scheduling and arbitration scheme provides deterministic
delay guarantees to the initial variable-size packets
(re-assembled at the output channel) as well.
METHOD FOR PROVIDING BANDWIDTH
AND DELAY GUARANTEES IN A CROSSBAR 
SWITCH WITH SPEEDUP 

FIELD OF THE INVENTION 

The present invention relates generally to variable and 
fixed size packet switches, and more particularly, to an 
apparatus and method for scheduling packet inputs through 
such packet switches. 

BACKGROUND OF THE INVENTION 

In the field of Integrated Services Networks, the impor- 
tance of maintaining Quality of Service (QoS) guarantees 
for individual traffic streams (or flows) is generally recog- 
nized. Thus, such capability continues to be the subject of 
much research and development. Of particular interest for a 
system providing guaranteed flows are the guarantees asso- 
ciated with bandwidth and delay properties. These guaran- 
tees must be provided to all flows abiding to their service 
contract negotiated at connection setup, even in the presence 
of other potentially misbehaved flows. Many different meth- 
ods have been developed to provide such guarantees in 
non-blocking switch architectures such as output buffered or 
shared memory switches. Several algorithms providing a 
wide range of delay guarantees for non-blocking architec- 
tures have been disclosed in the literature. See, for example, 
A. Parekh, "A Generalized Processor Sharing Approach to 
Flow Control in Integrated Services Networks", MIT, Ph.D 
dissertation, June 1994; J. Bennett and H. Zhang, "WF2Q:
Worst-case Fair Weighted Fair Queueing", Proc. IEEE 
INFOCOM'96; D. Stiliadis and A. Varma, "Frame-Based 
Fair Queuing: A New Traffic Scheduling Algorithm for 
Packet Switch Networks", Proc. IEEE INFOCOM '96; L. 
Zhang, "A New Architecture for Packet Switched Network 
Protocols," Massachusetts Institute of Technology, Ph.D 
Dissertation, July 1989; A. Charny, "Hierarchical Relative
Error Scheduler: An Efficient Traffic Shaper for Packet 
Switching Networks," Proc. NOSSDAV '97, May 1997, pp. 
283-294; and others. Schedulers capable of providing band- 
width and delay guarantees in non-blocking architectures are 
commonly referred to as "QoS-capable schedulers". 

Typically, output-buffered or shared memory architec- 
tures require the existence of high-speed memory. For 
example, an output-buffered switch requires that the speed 
of memory at each output must be equal to the total speed 
of all inputs. Unfortunately, the rate of the increase in 
memory speed available with current technology has not 
kept pace with the rapid growth in demand for providing 
large-scale integrated services networks. Because there is a 
growing demand for large switches with total input capacity 
of the order of tens and hundreds of Gb/s, building an output 
buffered switch at this speed has become a daunting task 
given the present state of technology. Similar issues arise 
with shared memory switches as well. 

As a result, many industrial and research architectures 
have adopted a more scalable approach, for example, cross- 
bars. Details of such architectures may be had with reference 
to the following papers: T. Anderson, S. Owicki, J. Saxe, C. 
Thacker, "High Speed Switch Scheduling for Local Area 
Networks", Proc. Fifth Intl. Conf. on Architectural Support
for Programming Languages and Operating Systems,
October 1992, pp. 98-110; and N. McKeown, M. Izzard, A.
Mekkittikul, W. Ellersick and M. Horowitz, "The Tiny Tera:
A Packet Switch Core." Even given the advances in the art,
providing bandwidth and delay guarantees in an input-queued
crossbar switch remains a significant challenge.

A paper by N. McKeown, V. Anantharam and J. Walrand,
entitled "Achieving 100% Throughput in an Input-Queued
Switch," Proc. IEEE INFOCOM '96, March 1996, pp.
296-302, describes several algorithms based on weighted
maximum bipartite matching (defined therein) and capable
of providing 100% throughput in an input-buffered switch.
Unfortunately, the complexity of these algorithms is viewed
as too high to be realistic for high-speed hardware
implementations. In addition, the nature of the delay
guarantees provided by these algorithms remains largely
unknown. Published research by D. Stiliadis and A. Varma,
entitled "Providing Bandwidth Guarantees in an Input-Buffered
Crossbar Switch," Proc. IEEE INFOCOM '95, April 1995,
pp. 960-968, suggests that bandwidth guarantees in an
input-buffered crossbar switch may be realized using an
algorithm referred to as Weighted Probabilistic Iterative
Matching (WPIM), which is essentially a weighted version of
the algorithm described in Anderson et al. Although the WPIM
algorithm is more suitable for hardware implementations
than that described by McKeown et al., it does not appear to
provide bandwidth guarantees.

One prior method of providing bandwidth and delay
guarantees in an input-buffered crossbar architecture uses
statically computed schedule tables (an example of which is
described in Anderson et al.); however, there are several
significant limitations associated with this approach. First,
the computation of schedule tables is extremely complex
and time-consuming. Therefore, it can only be performed at
connection setup time. Adding a new flow or changing the
rates of the existing flows is quite difficult and
time-consuming, since such modification can require
re-computation of the whole table. Without such
re-computation, it is frequently impossible to provide delay
or even bandwidth guarantees, even for a feasible rate
assignment. Consequently, these table updates tend to be
performed less frequently than may be desired. Second,
per-packet delay guarantees of the existing flows can be
temporarily violated due to such re-computation. Third,
there exists the necessity to constrain the supported rates to
a rather coarse rate granularity and to restrict the smallest
supported rate in order to limit the size of the schedule table.
All of these limitations serve to substantially reduce the
flexibility of providing QoS.

At this time, no other algorithms for providing bandwidth
and delay guarantees in input-buffered crossbars are known
to the inventors hereof. The search for scalable solutions
which can provide QoS guarantees has led to several notable
advances in the art. In one approach, an algorithm allows for
the emulation of a non-blocking output-buffered switch with
an output FIFO queue by using an input-buffered crossbar
with speedup independent of the size of the switch. See B.
Prabhakar and N. McKeown, "On the Speedup Required for
Combined Input and Output Queued Switching," Computer
Systems Lab. Technical Report CSL-TR-97-738, Stanford
University. More specifically, this reference proves that such
emulation is possible with a speedup of 4 and conjectures
that a speedup of 2 may suffice. This result is quite
important, as it allows one to emulate a particular
instantiation of a non-blocking output-buffered architecture
without having to use a speedup of the order of the switch size
(i.e., speedup equal to the number of ports). However, this
algorithm is only capable of a very limited emulation of an
output-buffered switch with FIFO service. Furthermore, as
described in the above-referenced technical report, such
emulation does not provide any delay guarantees. Its
capability of providing bandwidth guarantees over a large time
scale is limited to flows which are already shaped according
to their rate at the input to the switch, and no bandwidth
guarantees can be provided in the presence of misbehaved
flows.

It should be noted that in speeded-up input-buffered
architectures the instantaneous rate of data entering an
output channel may exceed the channel capacity. Therefore,
buffering is required not only at the inputs, but also at the
outputs. Therefore, input-buffered crossbar switches with
speedup are also known as combined input/output buffered
switches. Hereinafter, the more conventional term
"speeded-up input-buffered crossbar" shall be used.

Another published study of speeded-up input-buffered
switches suggests that input-buffered switches with even
small values of speedup may be capable of providing delays
comparable to those of output-buffered switches, but is silent
as to the kind (if any) of worst case guarantees provided in
the framework described therein. See R. Guerin and K.
Sivarajan, "Delay and Throughput Performance of
Speeded-up Input-Queuing Packet Switches," IBM Research
Report RC 20892, June 1997.

Thus, there exists a present need in the art to provide
deterministic delay and bandwidth guarantees while
utilizing the scalability of a crossbar architecture with
speedup.

SUMMARY OF THE INVENTION

Accordingly, it is an object of the present invention to
provide deterministic delay and bandwidth guarantees in an
input-buffered switch with speedup.

It is yet another object of the present invention to ensure
the bandwidth and delay guarantees to all flows
independently of the behavior of other flows.

It is still yet another object of the present invention to
accommodate dynamically changing load and flow
composition while operating at high speed, as well as avoid the
imposition of artificial restrictions on supported rates.

In accordance with the purposes of the present invention,
as embodied and described herein, the above and other
purposes are attained by an apparatus and corresponding
method for providing bandwidth and delay guarantees in an
input-buffered switch with speedup having input channels
and output channels for transferring cells therebetween. In a
broad sense, the apparatus includes a set of flow queues
corresponding to individual flows and a set of
per-output-channel queues in each of the input channels for
buffering cells awaiting transfer to the output channels. Also
residing in each input channel is a flow-level scheduler for
scheduling the flow queues and assigning cells for such flow
queues to the appropriate per-output-channel queues. Each
flow has a rate assigned to it (for example, at connection
setup time, or during renegotiation during the lifetime of a
flow). Each per-output-channel queue, which corresponds to a
different output channel, has an assigned rate and an ideal
service associated therewith. The rate of a per-output-channel
queue is computed as the sum of rates of all flows destined
from the input channel to the output channel corresponding
to this queue.

Also included in the apparatus of the present invention is
a rate controller corresponding to each input channel for
operating on the per-output-channel queues. The rate
controller schedules for a given cell slot the
per-output-channel queues in the input channel to which it
corresponds. The rate controller is defined so as to guarantee
to each corresponding per-output-channel queue an amount
of actual service that is within fixed bounds from the ideal
service of that per-output-channel queue. The fixed bounds,
which are constants expressed in units of cells, each are
equal to one cell. There is further included an arbiter,
responsive to the scheduling of each per-output-channel
queue by the rate controller to which it corresponds, for
controlling the processing of the queued cells in the
scheduled queues through the switch from the input channels
to the output channels at a speedup S equal to a number of
phases per cell slot. The arbiter, which associates with each
scheduled queue a timestamp equal to the time at which such
queue was scheduled, uses a specific maximal match
computation to choose one of the scheduled
per-output-channel queues from which a cell may be
transmitted in each phase.

In a narrower sense, each rate scheduler runs a
Rate-controlled Smallest Eligible Finish Time First (RSEFTF)
algorithm. For each per-output-channel queue, the rate
scheduler maintains a first and a second state variable, the
first state variable corresponding to an ideal beginning time
of the next cell of the per-output-channel queue and the
second state variable corresponding to an ideal finishing
time of transmission of the next cell of the
per-output-channel queue. The rate controller selects as
eligible all per-output-channel queues having an ideal
beginning time which is less than or equal to a channel clock
counter value. It then chooses as scheduled the eligible queue
having the smallest finish time and, for the chosen eligible
queue, updates the first state variable with the ideal finish
time and the second state variable with the ideal beginning
time plus one divided by the rate of the queue.

According to another narrower aspect of the present
invention, the specific maximal match computation
performed by the arbiter is as follows. First, there is provided
a Set_Match set and a Set_Queues set, the former being
initialized to an empty set and the latter to the set of the
associated timestamps. Next, the arbiter selects the smallest
of the associated timestamps stored in Set_Queues,
consequently adding the selected associated timestamp to
Set_Match and removing the selected associated timestamp
from Set_Queues. All remaining associated timestamps
associated with per-output-channel queues corresponding to
either the input channel or the output channel of the selected
associated timestamp are then deleted from Set_Queues. If
the Set_Queues set is empty, the arbiter sends an indication
of the per-output-channel queues corresponding to the
timestamps in the Set_Match set to the input channels to
which they correspond. If the Set_Queues set is not empty, the
computation returns to the step of selecting.

In an alternative embodiment of the present invention, the
queue scheduling of the rate controller is a centralized
function. That is, the arbiter runs all of the rate controllers
locally.

The present invention achieves several important goals. It
provides deterministic delay guarantees while utilizing the
scalability of a crossbar architecture. It allows arbitrary
assignment of guaranteed rates (as long as the rates are
feasible in the sense that the sum of all rates does not exceed
the total available bandwidth at any input or any output).
Additionally, it allows the flexibility to quickly admit new
flows and change the rate assignment of existing flows.
Moreover, it provides the protection of these guarantees to
well-behaved flows even in the presence of misbehaved
flows.

More specifically, it can be proved that deterministic
bandwidth and delay guarantees can be obtained with
speedup greater than or equal to 3 in a switch with 100%
load of all links. With no speedup or with small values of
speedup, the system is capable of providing similar
deterministic guarantees if the load due to guaranteed flows is




limited to a certain portion (S/3) of the bandwidth of any
link. The remaining bandwidth can be used by best-effort
traffic. The proof of these statements can be found in the
Appendix. Furthermore, it is conjectured that similar
guarantees can be provided with speedup of greater than or
equal to 2 and full load of the link, and with any speedup
greater than or equal to one with load due to guaranteed flows
limited to 50% of the link bandwidth.

While the invention is primarily related to providing
bandwidth and delay guarantees to flows requiring such
guarantees, it can be used in conjunction with best-effort
traffic which does not require such guarantees. If best-effort
traffic is present, it is assumed that the invention as described
herein runs at an absolute priority over any scheduling
algorithm for best-effort traffic.

BRIEF DESCRIPTION OF THE DRAWINGS

The above objects, features and advantages of the present
invention will become more apparent from the following
description of the embodiments of the present invention
illustrated in the accompanying drawings, wherein:

FIG. 1 is a block diagram depicting an input-buffered
crossbar switch capable of utilizing per-output-channel
queue scheduling and arbitration schemes in accordance
with the present invention;

FIG. 2 is a flow diagram illustrating the actions of the
input channel related to scheduling per-output-channel
queues;

FIG. 3 is a flow diagram depicting the arbitration policy
for providing bandwidth and delay guarantees in accordance
with the present invention; and

FIG. 4 is a depiction of one example of the scheduling
policy for per-output-channel queues in the input channel
shown in FIG. 2.

Referring to FIG. 1, with like reference numerals
identifying like elements, there is shown an input-buffered
crossbar switch 10 implementing a crossbar arbitration
scheme in accordance with the present invention. As
illustrated in FIG. 1, the underlying architecture of the
input-buffered crossbar switch 10 is represented as an n×m
crossbar. Here, "n" is the number of input channels i (1≤i≤n)
12 and "m" is the number of output channels j (1≤j≤m) 14.
Each input channel has one or more input ports 16, each of
which corresponds to a physical input link 18. Similarly, the
output channels each have one or more output ports 20, each
corresponding to a physical output link 22. The input
channels 12 are connected to the output channels 14 by way
of a crossbar unit 24. It will be understood by those skilled
in the art that the crossbar unit as depicted in FIG. 1 includes
a crossbar switch fabric of known construction, the details of
which have been omitted for purposes of simplification. It is
the crossbar switch fabric that is responsible for transferring
cells between input and output channels.

In the embodiment shown, the total capacity of all input
channels and all output channels is assumed to be the same,
although the capacity of individual links may be different.
Hereinafter, the capacity of a single channel is denoted by
r_c. The speed of the switch fabric, denoted by r_sw, is
assumed to be S times faster than the speed of any channel.
In general, the switch and the channel clocks are not
assumed to be synchronized. The speedup values may be
arbitrary (and not necessarily integer) values in the range of
1≤S≤n. It is further assumed that the switch operates in
phases of duration T_sw defined as the time needed to
transmit a unit of data at speed r_sw. Such phases are
referred to as matching phases. In this disclosure, a unit of
data shall be referred to as a cell. Accordingly, a switch can
move at most one cell from each input channel and at most
one cell to each output channel at each matching phase.
Therefore, on the average, a switch with speedup S can
move S cells from each input channel and S cells to each
output channel per channel cell time. At S=n, the switch is
equivalent to the output-buffered switch.

As shown in FIG. 1, packets received on a given input
link are typically buffered at the input ports. Each flow to
which the received packets correspond may be allocated a
separate buffer or queue at the input channel. These
"per-flow" queues may be located in an area of central
memory within the input channel. Alternatively, flow queues
may be located in a memory in the input ports associated with
the input channel. When the packets received from the input
links are of variable length, they are fragmented into
fixed-size cells. If the packets arriving to the switch all have
a fixed length (e.g., cells in ATM networks), no fragmentation
is required. In packet switching networks, where arriving
packets are of different size, the implementation is free to
choose the size of the cell as convenient. The tradeoff in the
choice of this size is that the smaller the cell, the better the
delay guarantees that can be provided, but the faster the
switch fabric must be (and therefore the more expensive the
switch). Small cell size also increases fragmentation
overhead. Upon arrival and after possible fragmentation,
cells are mapped to a corresponding flow (based on various
classifiers: source address, destination address, protocol
type, etc.). Once mapped, the cells are placed in the
appropriate "per-flow" queue.

Associated with each flow requiring bandwidth and/or
delay guarantees is some rate r_f. Typically, for guaranteed
rate or guaranteed delay flows, the rate r_f is established at
connection setup time (e.g., via RSVP). Rates assigned to
guaranteed flows can also be changed during a renegotiation
of service parameters as allowed by the current RSVP
specification. It is assumed that the rate assignment is
feasible, i.e., the sum of the rates of all flows at each input
port does not exceed the capacity of this input port, and the
sum of rates of all flows across all input ports destined to a
particular output port does not exceed the capacity of that
output port. The feasibility of rates across all input and
output ports implies the feasibility of rates across all input
and output channels. Included in the rate r_f guaranteed to
the flow is any overhead associated with packet
fragmentation and re-assembly. The actual data rate negotiated
at connection setup may therefore be lower. For networks with
fixed packet size, such as ATM, however, no segmentation
and re-assembly is required. Thus, no overhead is present.

As shown in FIG. 1, each input channel i 12 has m
per-output-channel queues 26 (also referred to as per-output
or virtual output queues), denoted by Q(i,j), 1≤j≤m, one for
each output channel j 14. In the embodiment shown in FIG.
1, the input channel maintains a single flow-level scheduler
S_f(i) 28, which needs to schedule only a single flow per
cell time. Once scheduler S_f(i) schedules some flow f, it
adds the index of this flow f (or, alternatively, the head of the
line (HOL) cell of flow f) to the tail of queue Q(i,j). Thus,
depending on the implementation, Q(i,j) may contain either
cells or pointers to cells of individual flows.

Also located at each input channel 12 is a rate controller
30 (or scheduler S_q(i)), which schedules or selects for
processing one (or more) of the virtual output queues Q(i,j)
at every channel cell time. The channel cell time is defined
as the time required to transmit one cell at channel speed.
The scheduler S_q ensures that the aggregate of all flows
destined from a given input to a given output is guaranteed
the bandwidth r_i,j equal to the sum of individual flow rate
assignments over all flows destined from a given input to a
given output, as well as per-cell delay. An example of
scheduler S_q is described in paragraphs below. Indices of
(or pointers to) the queues Q(i,j) chosen at each cell time are
given to an arbiter 32, which is located in the crossbar unit
24. It is the arbiter's responsibility to determine which of the
input channels should be able to transmit a cell to particular
output channels. It is assumed that the arbiter operates in
phases, also referred to as "matching phases". The duration
of each phase is equal to the duration of the channel cell slot
divided by the speedup S. The goal of the arbiter is to
compute a maximal (conflict-free) match between the input
and output channels so that at most one cell leaves any input
channel and at most one cell enters any output channel during
a single matching phase. Although the term "maximal
match" (or, alternatively, "maximal matching") is well
understood by those skilled in the art, a definition may be
had with reference to papers by N. McKeown et al. and
Stiliadis et al., cited above, as well as U.S. Pat. No.
5,517,495 to Lund et al.
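
By way of illustration only, the Rate-controlled Smallest Eligible Finish Time First (RSEFTF) rule that this disclosure gives for the scheduler S_q may be sketched as follows. The class name, the method names and the lowest-index tie-break are assumptions made for the sketch (the disclosure allows ties to be broken arbitrarily), rates are assumed positive and expressed in cells per channel cell slot, and the sketch does not check whether a queue actually holds a cell.

```python
from fractions import Fraction


class RsEftfScheduler:
    """Sketch of the S_q rule for one input channel (names are hypothetical)."""

    def __init__(self, rates):
        # rates maps output index j to r_i,j, in cells per channel cell slot.
        self.rates = {j: Fraction(r) for j, r in rates.items()}
        # State variables: b_i,j starts at 0 and f_i,j at 1/r_i,j.
        self.begin = {j: Fraction(0) for j in self.rates}
        self.finish = {j: 1 / r for j, r in self.rates.items()}
        self.time = 0  # channel clock counter, in cell slots

    def run_slot(self):
        """Schedule at most one queue for the current cell slot."""
        # Eligible queues: ideal beginning time not later than the clock.
        eligible = [j for j in self.rates if self.begin[j] <= self.time]
        chosen = None
        if eligible:
            # Smallest finish time first; ties broken here by lowest index.
            chosen = min(eligible, key=lambda j: (self.finish[j], j))
            # b <- f, then f <- (new) b + 1/r for the chosen queue only.
            self.begin[chosen] = self.finish[chosen]
            self.finish[chosen] = self.begin[chosen] + 1 / self.rates[chosen]
        self.time += 1
        return chosen


sched = RsEftfScheduler({0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)})
order = [sched.run_slot() for _ in range(8)]
```

With the assumed rates of 1/2, 1/4 and 1/4, the eight cell slots are divided 4, 2 and 2 among the three queues, in proportion to the rates.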

The arbiter maintains n×m queues 34, denoted by A(i,j),
with each arbiter queue corresponding to a different one of
Q(i,j). The arbiter queues 34 are used to store timestamps as
described below. At each channel cell time the arbiter
receives the index (or indices) of the per-output queue(s)
Q(i,j) chosen by the input scheduler S_q(i) at this channel
slot time. When the arbiter 32 receives the index of some
Q(i,j), it adds a timestamp equal to the current time into the
corresponding queue A(i,j).

As explained above, during each of its matching phases,
the arbiter decides which input can send a cell to which
output by computing a maximal matching between all inputs
and all outputs. The algorithm used to compute the maximal
match is described in detail in paragraphs to follow. Once
the matching is completed, the arbiter notifies each input of
the output to which it can send a cell by sending to the input
channel the index of the per-output queue from which the
cell is to be transmitted. The input channel then picks a cell
to send to that output channel and the cell is transmitted to
the output channel.
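
By way of illustration only, the timestamping of the arbiter queues A(i,j) and the smallest-timestamp-first match computation stated in the Summary may be sketched as follows. The function names are hypothetical. The sketch assumes that only the head timestamp of each non-empty A(i,j) competes in a given phase and that a matched timestamp is removed from its queue; neither detail is spelled out in this part of the disclosure.

```python
from collections import deque


def record_scheduled(arbiter_queues, i, j, now):
    """Append a timestamp to A(i,j) when input i reports that Q(i,j) was scheduled."""
    arbiter_queues.setdefault((i, j), deque()).append(now)


def maximal_match(arbiter_queues):
    """Return a conflict-free list of (input, output) pairs for one matching phase."""
    # Set_Queues: head timestamp of every non-empty A(i,j) (assumption).
    set_queues = {ij: q[0] for ij, q in arbiter_queues.items() if q}
    set_match = []
    while set_queues:
        # Select the smallest timestamp; ties broken here by (i, j) order.
        i, j = min(set_queues, key=lambda ij: (set_queues[ij], ij))
        set_match.append((i, j))
        arbiter_queues[(i, j)].popleft()  # matched cell leaves A(i,j) (assumption)
        # Delete every entry sharing the selected input or the selected output.
        set_queues = {
            (a, b): t for (a, b), t in set_queues.items() if a != i and b != j
        }
    return set_match


queues = {}
for (i, j), t in [((0, 0), 1), ((0, 1), 2), ((1, 0), 3), ((1, 1), 5)]:
    record_scheduled(queues, i, j, t)
first = maximal_match(queues)
second = maximal_match(queues)
```

Each returned pair uses a distinct input and a distinct output, so at most one cell leaves any input channel and at most one cell enters any output channel in the phase.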

When an input channel 12 receives from the arbiter the
index of the Q(i,j) corresponding to the output channel for
the current matching phase, it forwards the HOL cell of
Q(i,j) (or, alternatively, the cell pointed to by the HOL
pointer in Q(i,j)) to the output channel j. If Q(i,j) is empty
(that is, there is no cell of a guaranteed flow in the queue),
then a cell of a lower-priority service destined to the same
output is sent instead. If there is no best-effort traffic at this
input in this matching phase, then no cell is sent. The size of
Q(i,j) is determined by the properties of the schedulers S_f
and S_q.
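
By way of illustration only, the choice made by an input channel when the arbiter grants it output j may be sketched as follows. The function name and the representation of the guaranteed and best-effort queues as per-output deques are assumptions made for the sketch.

```python
from collections import deque


def cell_for_grant(j, guaranteed, best_effort):
    """Pick the cell an input channel sends to output j in this matching phase.

    guaranteed[j] plays the role of Q(i,j); best_effort[j] is a hypothetical
    queue of lower-priority cells destined to the same output.
    """
    if guaranteed[j]:
        return guaranteed[j].popleft()   # HOL cell of Q(i,j)
    if best_effort[j]:
        return best_effort[j].popleft()  # lower-priority cell sent instead
    return None                          # nothing to send in this phase


guaranteed = {0: deque(["g0"]), 1: deque()}
best_effort = {0: deque(["b0"]), 1: deque(["b1"])}
sent = [cell_for_grant(j, guaranteed, best_effort) for j in (0, 1, 1)]
```

The guaranteed queue always takes precedence, which reflects the absolute priority over best-effort traffic assumed in the disclosure.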

In another variation, each input channel could maintain
one flow-level scheduler S_f(i,j) for each output. When the
input channel i needs to transmit a cell to a given output j,
it invokes scheduler S_f(i,j) to determine which flow
destined to j should be chosen. Unlike the option described
above, in which scheduler S_f(i) can run at link speed, the
flow-level schedulers S_f(i,j) must be capable of choosing
up to S cells per cell time, as it is possible that this input may
need to send a cell to the same output in all S matching
phases of the current cell slot.

In yet another approach, the input can run m parallel S_f
schedulers, one per output. Each of these schedulers may
schedule 1≤k≤S cells per cell time. When a flow is
scheduled by S_f, an index to this flow is added to Q(i,j).

Although not shown in FIG. 1, a cell forwarded by an 
input channel i to an output channel j is added to a queue 
maintained by the output channel. A variety of queuing 
disciplines can be used, such as FIFO, per-input-port, or per 
flow. If the queue is not a simple FIFO, each output has an 
additional scheduler, shown in FIG. 1 as output scheduler 
S_o 36. This output scheduler determines the order in which 
cells are transmitted onto the output link from the output 
channel. It is assumed that any required reassembly occurs 
before S_o is used, so that S_o schedules packets rather
than cells. 

Any known QoS-capable scheduler, such as those
mentioned in the background section, can be used for the
schedulers S_f and S_o.

Since the schedulers S_f and S_o operate independently of
each other, the delay of an individual cell in the switch is the
sum of the delays of this cell under its input and output
schedulers S_f and S_o, plus the delay due to potential
arbitration conflicts. The delay of a packet segmented into
cells comprises the delay experienced by its last cell
plus the segmentation and re-assembly delays.
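
By way of illustration only, this decomposition may be written compactly as follows. The symbols are notation assumed here and are not defined in the disclosure: D_f and D_o denote the delays under S_f and S_o, D_arb the arbitration delay, and D_seg and D_reasm the segmentation and re-assembly delays.

```latex
D_{\text{cell}} = D_{f} + D_{\text{arb}} + D_{o},
\qquad
D_{\text{packet}} = D_{\text{cell}}^{\text{(last)}} + D_{\text{seg}} + D_{\text{reasm}}
```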

Still referring to FIG. 1, it can now be appreciated that, 
with respect to each input channel, S_q operates on each of 
the queues Q(i,j) containing cells (or pointers to cells) which
have already been scheduled by S_f but which have not yet
been transmitted to their destination output channel due to 
arbitration conflicts. The present invention undertakes the 
task of determining the sequence of transmissions between 
input channels and output channels satisfying the crossbar 
constraint that only one cell can leave an input channel and 
enter an output channel per phase in such a way that the 
arbitration delay is bounded for each cell awaiting its 
transmission at the input channel. In the system as parti- 
tioned in the embodiment shown in FIG. 1, this task is 
distributed among the arbiter and the input channels. This 
task is discussed in further detail below. 

Now referring to FIG. 2, there are illustrated the actions of
the input channel with respect to scheduling the
per-output-channel queues and the interaction with the
arbiter in accordance with the present invention. At the
initial step 42, the input channel initializes associated state
variables and obtains the assigned rates r_i,j (again, equal to
the sum of rates of all flows going from input i to output j)
for the queues Q(i,j). The rate assignment is feasible in that
the sum of rates r_i,j across all input channels i is less than or
equal to the channel rate r_c and the sum of rates r_i,j across
all output channels j is less than or equal to r_c. The
feasibility is ensured by admission control or by other means
not discussed herein. In step 44 the input channel initializes
its clock counter (denoted time) to zero. The unit of time for
this clock is one channel cell slot. Further, in step 46, which
is invoked each cell slot, the scheduler S_q is run to
determine the queue Q(i,j) to be selected or scheduled during
this time slot. The operation of scheduler S_q will be
discussed in more detail later with reference to FIG. 4. In
step 48 the index of the queue selected in step 46 is passed
to the arbiter. In step 50 the input channel checks if the
arbiter has notified it of any of its queues Q(i,j) having been
matched during this cell slot. If so, the HOL cell in the
matched queue is transmitted to the corresponding output
channel in step 52. Steps 50, 52, 54 may be repeated several
times during one cell slot, since there may be several
matching phases occurring during one cell slot. At the end
of the cell slot the clock counter is incremented (step 56). To
accommodate potential rate changes, the rates r_i,j are
updated if necessary as indicated in step 58.

The operation of the arbiter 32 is now described with
reference to FIG. 3. As illustrated in FIG. 3, at the initial step
60, the arbiter clock is initialized to zero. The unit of time
for this clock is the duration of one matching phase. At the
beginning of any matching phase, the arbiter checks if any
queue indices have been sent to it by input channels (step
62). If the arbiter has not received any queue indices, it
determines when the current matching phase is over at step
64. Once it has been determined that the current phase is
over, the clock counter time is incremented by one at step 66.
If the arbiter has received any queue indices in the current
phase, it adds a timestamp equal to the current clock counter
value to the tail of the queues A(i,j) corresponding to the
queues Q(i,j) whose indices have been received in step 62
(step 68).

The maximal match computation 69, which is used to
generate a match, is performed according to the present
invention as shown in steps 70, 72, 74, 76, 78. A match is
defined as a conflict-free set of input/output pairs and is
computed based on the current contents of queues A(i,j) at

based on a Rate-controlled Smallest Eligible Finish Time
First (RSEFTF) algorithm. For each of the per-output queues
Q(i,j), RSEFTF maintains two state variables: a first state
variable b_i,j, which is the ideal beginning time of
transmission of the next cell of this queue, and a second state
variable f_i,j, which is the ideal finishing time of
transmission of the next cell of this queue. Initially, b_i,j=0
and f_i,j=1/r_i,j (initialization of these state variables
occurs in step 42 of FIG. 2). As shown in step 80 of FIG. 4,
the scheduler selects all queues at this input for which b_i,j
is less than or equal to the current channel cell slot time
(channel clock counter time). Such queues are called eligible
queues. In step 82, the scheduler then chooses as scheduled
the queue Q(i,j) with the smallest finish time f_i,j from
among the eligible queues. Ties are broken arbitrarily. The
queue chosen in step 82 is the one whose index is sent to the
arbiter in step 48 of FIG. 2. In step 84, the scheduler updates
the state variables b_i,j and f_i,j corresponding to the chosen
queue as follows: b_i,j is set to f_i,j, and f_i,j is then set to
b_i,j+1/r_i,j. The variables b_i,j and f_i,j for all other queues
(i.e., queues not chosen at the current cell slot) remain
unchanged.

It can be shown that the described embodiment satisfies

, i l n * • t u * *m *u u * ■ „ several properties. Property 1 is as follows: In an nxm 

the arbiter. Beginning with step 70, the arbiter initializes a u • „ CC1 L' . , , c:> - , 

first set Set Match to an empty set and a second set cro f bar RSEFTF with integer speedup S13 and 

Set QueuestothesetofallnoD<mptyqueuesA(ij).Instep 25 matching phases synchronized with input cell imes arbi- 

72,The arbiter finds the smallest timestamp of all Head-of- 'f on dela y ^ ° ev e < " channel "P 8 ?; 

y • mnr v «. or . 0 • Co , rw«*« Th* ah ;\ More specifically, any cell scheduled at the beginning of cell 

Line (HOL) timestampsin set Set_Queues. Ine queue A(i,j) *~ / ' ' . . f 4 & 

, v . . ; u 11 * *• * f j • n •<> «aa~a slot t is transmitted before the beginning of cell slot t+n. This 

containing the smallest timestamp found in step 72 is added ♦ • . n . 

, c , * t , . , j t ,u- +a . „ „„ is true for arbitrary feasible rate assignment. Property 2 is as 

to Set_Match in step 74. In step 76, the arbiter removes 7 . 6 ricl ^™ c f J , 

from set Set Queues all queues A(i« and A(kj) where i 30 f° Uows: In " nxra crossbar nmmng RSEFTF, for arbitrary 

and j are the Input and output of the queue A(i,j) selected in ( n ° l f^ssanly integer) speedup Sg3 and no assumption on 

* ti „L a ii mm *L eQim „ ■ synchronization between the channel clock and the switch 

step 72 (these are all queues corresponding to the same input ; , u j n i . • 

and the same output as those of A(i,j) already added to the clock < l *> no ahgnmem of matchmg phases and cell slots is 

match). If there are no more unmatched queues (step 78), the asswned )' "bitration delay of any cell is bounded by n as 

match is complete. Consequently, the arbiter sends the 35 ong as forany output j to rate assigrmient satisfies 2 i 

inputs the indices of the matched queues at step 80 and £-*>><f • .™ c , P r °° ° f P ro P e ^" I 1 * T*\? < 

returns to step 66. Otherwise, the arbiter goes to the next Appends. Note that Property 1 holds for any feas,ble rate 

4 . „ , mni „ U ; ^„ /k„ c-t^ h->\ assignment, while Property 2 has been proved only for the 

iteration of the matching process (by returning to step 72). . .. c * r « « « • 

. , , x. . . . . , t • ^ case wnen tne sum °f rates °^ a ^ queues at all inputs 

Another essential element of the invention is the choice of 4Q corresponding to M output channel is strictly less than the 

a rate-controller S_q operating on the per-output queues ^ of ^ Qutput channe] Howeverj since this sum can 

Q(i j) at each input. More specifically, rate-controller S_q is be arbitrarily close t0 the channel capaci t y , f or a n practical 

defined as a rate controller capable of guaranteeing each purposes this i imitat ion is unimportant. Although properties 

per-output queue its assigned rate r_i,j and offering an t and2 have been proven only for speedup 3, simulations 

amount of service W_i 0 (t) to each per-output queue many 45 indicate ^ ^ ^ is suffident tQ obtain thc 

interval (0,t) so as to satisfy the following condition delay boimd of n , t caQ be conjectured> therefore, that 

^j-Ei*w_um<r_i,j+E2 (eqi) ProperUes ,1 and 2 also hold for S-2. 

Since the cost of the switch is typically the higher the 

where El and E2 are the late and early work discrepancy larger the speedup, the application of the present invention 

bounds respectively, which are assumed to be constants and 50 in switches with speedup 1 ^S^3 is now explored in some 

are expressed in the units of cells. In the embodiments detail. Here, the method and apparatus of the present inven- 

described herein, E1~E2=1 cell. El and E2 should be tion can be used without modification if, instead of allowing 

interpreted as the lower and upper bounds on the discrep- arbitrary feasible rate assignment, the total bandwidth allo- 

ancy between W_ij(t) (the amount of actual service given cated to guaranteed flows is restricted to a certain portion of 

by S_q to queue Q(i,j), expressed in cells) and the ideal 55 the link bandwidth. More specifically, it can be shown that: 

service tr ij cells that same queue should have received in for arbitrary 1^S^3, with no assumption on synchroniza- 

this interval under the fluid model (recall that the rates arc tion of cell slot and phase clocks, if the rate assignment of 

assumed to be expressed in units of cells per a unit time). guaranteed flows satisfies (1) 2„i(r_ij)<S/3, 2_j(r__ij) 

Thus, any S_q satisfying the above condition at all times t §S/3 OR (2) 2_i(r__ij)^S/3, 2__j(r_i,j)<S/3, then the 

can be used to provide rate and bandwidth guarantees with 60 arbitration delay of any cell is bounded by 3(n+m)/S-l 

the values of speedup independent of the size of the switch. [Property 3]. The proof of Property 3 is also given in the 

One instantiation of a scheduler S_q which performs the Appendix. Property 3 demonstrates that even in switches 

queue scheduling operation depicted in step 46 of FIG. 2 and with small speedup or no speedup at all, as long as the sum 

satisfies the above-stated condition (eql) is now considered of rates of guaranteed flows does not exceed the ratio S/3 of 

with reference to FIG. 4. This particular rate controller, 65 the link bandwidth, each cell can be guaranteed determin- 

essentially equivalent to a rate-controlled version of WF2Q istic arbitration delay and therefore deterministic total delay 

by Bennett et. al., mentioned in the Background section, is in the switch. This delay is the larger the smaller the speedup 
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value. The restriction on the sum of the rates can be achieved 
by admission control or by other means, which are not 
discussed in this disclosure. 
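
The maximal match computation of steps 70 through 78 of FIG. 3 can be sketched in a few lines of Python. This is an illustrative sketch only, not the disclosed implementation: the function name `maximal_match`, the dictionary-of-deques representation of the queues A(i,j), and the index-based tie-breaking are assumptions made for the example. Removing the head timestamp of each matched queue once its cell is transferred is left out.

```python
from collections import deque


def maximal_match(arbiter_queues):
    """Greedy maximal match over timestamp queues A(i,j).

    arbiter_queues maps an (input, output) pair to a deque of integer
    timestamps with the oldest timestamp at the head (hypothetical layout).
    Returns the matched (input, output) pairs in selection order.
    """
    set_match = []                                       # step 70
    set_queues = {k for k, q in arbiter_queues.items() if q}
    while set_queues:                                    # step 78
        # Step 72: smallest head-of-line timestamp. Ties are broken by
        # queue index here only to keep the example deterministic.
        i, j = min(set_queues, key=lambda k: (arbiter_queues[k][0], k))
        set_match.append((i, j))                         # step 74
        # Step 76: drop every queue sharing input i or output j.
        set_queues = {(a, b) for (a, b) in set_queues if a != i and b != j}
    return set_match


queues = {
    (0, 0): deque([3]),
    (0, 1): deque([1, 4]),
    (1, 0): deque([5]),
    (1, 1): deque([2]),
}
match = maximal_match(queues)
```

In this example the pair (0,1) holds the oldest timestamp and is matched first, which eliminates (0,0) and (1,1). The pair (1,0) is then matched even though its timestamp is the largest, because it no longer conflicts with any matched pair.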

There are many ways to implement various known com- 
ponents of the described embodiment. Therefore, the details 
of such implementations are largely omitted from the dis- 
closure. However, memory requirements at the arbiter are 
now discussed briefly. The arbiter needs to store timestamps 
of the cells scheduled by input schedulers S_q but not yet 
transmitted to their outputs. Memory requirements depend 
on the size of the timestamps stored in the arbiter and the 
number of such timestamps. Note first that the timestamps 
are integers, since they represent the matching phase num- 
ber. At first glance, it may appear that the timestamps can 
grow infinitely large as the execution time increases. This 
would be impractical since the size of the timestamps and, 
consequently, memory requirements would be unbounded. 
However, the fact that the arbitration delay is bounded 
means that the difference between any two timestamps is 
bounded as well. This allows for the reduction in the range 
of timestamps to a limited range. The bound on arbitration 
delay also implies that the number of the timestamps that can 
be stored in the arbiter is limited as well. For example, 
Properties 1 and 2 demonstrate that with speedup S≥3 the
arbitration delay of any cell can be at most n. Since for each
input i at most one cell is scheduled per channel slot, it is
clear that there cannot be more than n timestamps in all
queues A(i,j) corresponding to a single input i. It can be
shown that this implies that no queue can contain more than
n+1 timestamps, which implies that the total number of
timestamps in the arbiter cannot exceed n(n+1). Therefore,
at an additional expense of storage required for tracking
which queue a timestamp belongs to, there is no need to
store more than n(n+1) timestamps, since the memory can be
dynamically allocated to a newly arrived timestamp. If the
implementation chooses to statically allocate memory for all
A(i,j), however, then the total amount of memory required is
mn(n+1), because in the worst case n+1 timestamps can be
in any of the nm queues.
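
The two storage figures above, and one possible way to exploit the bounded timestamp range, can be illustrated as follows. The disclosure does not specify how the range reduction is carried out; the modular comparison below (in the style of serial-number arithmetic) and the names `arbiter_memory` and `make_wrapped_compare` are assumptions for illustration, and `max_diff` stands for whatever bound on the difference between two live timestamps the delay bound yields.

```python
def arbiter_memory(n, m):
    """Timestamp slots needed: (dynamic allocation, static allocation)."""
    return n * (n + 1), m * n * (n + 1)


def make_wrapped_compare(max_diff):
    """Compare timestamps stored modulo a small range.

    Assumes any two timestamps alive at the same time differ by at most
    max_diff matching phases, so a modulus of 2 * max_diff + 1 is enough
    to tell which of the two is older.
    """
    modulus = 2 * max_diff + 1

    def is_older(a, b):
        # True when wrapped timestamp a precedes wrapped timestamp b.
        return 0 < (b - a) % modulus <= max_diff

    return modulus, is_older


dynamic_slots, static_slots = arbiter_memory(n=4, m=4)
modulus, is_older = make_wrapped_compare(max_diff=12)  # 12 is illustrative
early, late = 20 % modulus, 27 % modulus               # 27 wraps around to 2
```

With these illustrative numbers, `is_older(early, late)` still holds after the later timestamp has wrapped around, so the arbiter could keep each timestamp in a fixed number of bits.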

Alternatively, the arbiter need not maintain all the times- 
tamps of scheduled cells. Instead, part of the burden of 
maintaining timestamps is shifted to the input channel. This 
can be explained more clearly with reference to step 48 of
FIG. 2 and step 68 of FIG. 3. At step 48, the timestamps
communicated by the input channel to the arbiter at the 
beginning of each matching phase include only those times- 
tamps corresponding to scheduled HOL cells in each queue. 
If the scheduler S_q can only schedule one cell per cell time 
(that is, the scheduler S_q operates at the speed of the input 
channel), then the input channel need only communicate at 
most two timestamps to the arbiter: one for the queue whose 
cell was transmitted to the output at the end of the previous 
matching phase (and there could only be at most one such 
cell per input), and one for the cell which has just been 
scheduled by S_q at the current channel cell time. If the 
scheduler S_q operates faster than the speed of the channel, 
then more than one cell can be scheduled per channel cell 
time. At step 68, then, the arbiter receives at most two 
timestamps per input/output pair. Consequently, since only 
one timestamp now needs to be stored for each A(i,j),
memory requirements of the arbiter are reduced, but at the 
expense of increased communications between the input 
channels and the arbiter and at the expense of additional 
storage at the input, as the input channel now needs to store 
not only the cell (or the pointer to the cell) scheduled by 
S_q, but also the time at which it was scheduled. 



As described above and shown in FIG. 1, the S_q portion
of the arbitration mechanism is decentralized at the input
channels. Alternatively, the arbiter can run all of the rate
controllers S_q locally. To do so, the arbiter treats the
arrivals to the per-output queues at the input channels as
occurring exactly at their ideal times and at their ideal rates.
Conceptually, it maintains for each input/output pair a queue
A(i,j) to which imaginary "dummy" cells arrive according to
their ideal inter-arrival times. It then runs n copies of the
S_q rate controller (one for each input) to determine the
scheduling order of these dummy cells. The operation of the
input channel is now simplified by omitting steps 46 and 48
of FIG. 2. While step 48 is eliminated completely, step 46 as
performed by the arbiter for all input channels in parallel
(step 46') replaces steps 62, 64 and 66 in FIG. 3. The rest of
the operation follows that of the distributed version
described above in reference to FIG. 2 and FIG. 3. As should
be clear from this description, the arbiter is not provided
with queue indices for queues scheduled by each input
channel at each cell time. Rather, the arbiter obtains this
information for itself. Unlike the implementations previously
described, this approach clearly puts a higher load on
the arbiter in terms of computation and storage, while
reducing the amount of communication between the input
channels and the arbiter. With this approach, it is possible
that some queue Q(i,j) at the input channel is matched when
this queue is empty. In this case the input channel simply
does not send any guaranteed cell. If best effort traffic is
present, the opportunity to transmit the cell is passed to best
effort traffic.
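
A rate controller of the kind the arbiter would run once per input can be sketched as below, following the RSEFTF description given with reference to FIG. 4 (eligibility test, smallest finish time, state update). The class name `Rseftf`, the use of exact fractions, and the index-based tie-breaking are assumptions of this sketch. Like the centralized variant described above, it does not look at queue occupancy.

```python
from fractions import Fraction


class Rseftf:
    """Rate controller for the per-output queues of one input channel."""

    def __init__(self, rates):
        # rates maps output index j to the assigned rate r_i,j in cells
        # per cell slot; zero-rate queues are never scheduled.
        self.rate = {j: Fraction(r) for j, r in rates.items() if r > 0}
        self.begin = {j: Fraction(0) for j in self.rate}        # b_i,j
        self.finish = {j: 1 / r for j, r in self.rate.items()}  # f_i,j

    def schedule(self, time):
        """Return the output index scheduled at this cell slot, or None."""
        eligible = [j for j in self.rate if self.begin[j] <= time]
        if not eligible:
            return None
        # Smallest finish time first; ties broken by index (arbitrary).
        j = min(eligible, key=lambda k: (self.finish[k], k))
        self.begin[j] = self.finish[j]
        self.finish[j] = self.begin[j] + 1 / self.rate[j]
        return j


controller = Rseftf({0: Fraction(1, 2), 1: Fraction(1, 4)})
order = [controller.schedule(t) for t in range(8)]
```

Over these eight cell slots queue 0 is selected four times and queue 1 twice, in line with their assigned rates of 1/2 and 1/4, and two slots are left idle.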

In all options of the preferred embodiment as described
thus far, there is a scheduler S_q for each input channel.
Where the schedulers S_q are run by the input channels, the
computational work required for the scheduling decision is
distributed among input channels and therefore the
computational load is reduced. As discussed above, it is also
possible to perform all scheduling decisions at the arbiter by
running in parallel schedulers S_q for each of the input
channels. In this case, the complexity of the arbiter is
increased, but there is substantial saving in the amount of
communication required between input channels and the
arbiter. Properties 1, 2 and 3 described above hold for either
one of these two options.

While the disclosed input-buffered switch and scheduling
method have been particularly shown and described with
reference to the preferred embodiments, it will be understood
by those skilled in the art that various modifications in
form and detail may be made therein without departing from
the scope and spirit of the invention as set forth by the
claims. Accordingly, modifications such as those suggested
above, but not limited thereto, are to be considered within
the scope of the claims.

APPENDIX 

It is assumed that all cells are scheduled by their S_q
scheduler at the beginning of a channel cell slot and
immediately become available for the arbiter. Matching occurs at
the right-hand boundary (i.e. at the end) of a matching phase.
Only cells available to the arbiter at the beginning of a
matching phase participate in the matching process during
this phase. It is also assumed that, if a phase boundary
coincides with a slot boundary, all departures at this phase
boundary occur prior to the scheduling decisions of S_q at
the beginning of the next cell slot. The term "cell slot" shall
be used to denote channel cell slot.

Lemma 1. If a cell c(i,j) has been scheduled before or at the
beginning of a matching phase and remains at its input at the
end of this matching phase, then a cell with smaller or
equal scheduling time went from input i or to output j
during this phase.
Proof.

By operation of the algorithm, matching occurs in the
increasing order of scheduling times. Therefore, if no other
cell had gone from input i or to output j in a match, then by
operation of the algorithm this cell would have gone. This
is a simple consequence of the input-output contention.
Definition. Arbitration delay of a cell is the time elapsed
between the time it is scheduled by its input scheduler
S_q and the time it is transferred to its output.
Property 1. In an n×m crossbar (n inputs, m outputs) running
RSEFTF with integer speedup S≥3 and matching phases
aligned with input cell times, arbitration delay of any cell
never exceeds n channel cell times. More specifically, any
cell scheduled at the beginning of cell slot t is transmitted
before the beginning of cell slot t+n. This is true for
arbitrary feasible rate assignment.
Proof.

Consider the first time t this is violated for one or more cells
in the switch. There must be one or more cells which were
scheduled at or before the beginning of cell slot t-n and
which still remain at their inputs at the beginning of cell
slot t. Let c(i,j) denote the cell with the earliest scheduling
time of all such cells (if there are several cells with the
same earliest scheduling time, we pick any of them). This
cell must have been scheduled by its scheduler S_q at the
beginning of cell slot t-n. This is because if it were
scheduled later, it would not have violated its delay bound
n by time t, and if it were scheduled earlier, its delay
bound would have been violated before t, which would
contradict the assumption that t was the first time a
violation occurred. For integer speedup S with matching
phases synchronized with cell times, there are exactly nS
matching phases which occur between the beginning
of cell slot t-n and the beginning of cell slot t. By
Lemma 1, in order for cell c(i,j) to remain at its input at
time t, there must have been at least nS cells with timestamps
smaller than or equal to t-n (which is the timestamp of
c(i,j)) which were scheduled by the S_q scheduler at
input i and/or by any input scheduler S_q destined to
output j.

Clearly, no cell scheduled after time t-n will have a timestamp
less than or equal to t-n. This means that, in order
for c(i,j) to remain at its input at time t, at time t-n there
must have been at least nS cells aside from c(i,j) at input
i and/or at any input destined to output j, which had been
scheduled by the scheduler S_q at or prior to the beginning
of time t-n, but not transmitted yet by that time. By
the assumption of t-n being the earliest scheduling time
of any cell which violated delay bound n, all cells which
were scheduled by their S_q schedulers at or prior to time
t-2n would have been transmitted by the beginning of cell
slot t-n. Therefore, the only cells which can be not yet
transmitted by time t-n are those which were scheduled in
the n cell slots t-2n+1, t-2n+2 . . . t-n, which includes c(i,j)
itself. Since the input scheduler S_q schedules at most
one cell per cell slot, there can be at most n-1 such cells
at input i (not counting c(i,j)). By the properties of
RSEFTF, each queue (i',j') can be scheduled at most
n r_i',j'+1 times in any n cell slots. Therefore, there can be
at most Σ_i(n r_i,j)+n-1 cells scheduled in the interval
(t-n-(n-1), t-n) of n cell slots to output j (not
counting c(i,j)). Therefore, the total number of cells at
time t-n which are either at input i or destined to output
j which could still be at their inputs at the beginning of cell
slot t-n with timestamps less than or equal to t-n, not
counting c(i,j), is at most 2n+Σ_i(n r_i,j)-1<3n.
Therefore, for any S≥3 there will not be enough cells
with timestamps smaller than or equal to that of c(i,j) to
prevent c(i,j) from leaving its input before time t. QED.

Property 2. In an n×m crossbar running RSEFTF, for arbitrary (not
necessarily integer) speedup S≥3, with no assumption on
synchronization between the cell slot clock and the phase
clock (no alignment of phases with cell slots), arbitration
delay of any cell is bounded by n as long as for any
output j Σ_i(r_i,j)<1.

Proof.

The proof is almost identical to that of Property 1. Consider
the first time t this is violated for one or more cells in the
switch. That is, at the beginning of some cell slot t, there
are one or more cells which were scheduled at or before the
beginning of cell slot t-n and which still remain at their
inputs. Let c(i,j) be the cell with the earliest scheduling time
of all such cells (if there are several cells with the same
earliest scheduling time, we pick any of them). This cell
must have been scheduled by its scheduler S_q at the
beginning of cell slot t-n. This is because if it were
scheduled later, it would not have violated its delay bound
n by time t, and if it were scheduled earlier, its delay
bound would have been violated before t, which would
contradict our assumption that t was the first time a
violation occurred.

It is easy to see that for any speedup S there are at least
nS-2 full matching phases which occur between the beginning
of cell slot t-n and the beginning of cell slot t. By
Lemma 1, in order for cell c(i,j) to remain at its input at time
t, there must have been at least nS-2 cells with timestamps
smaller than or equal to that of c(i,j) scheduled from i and/or to j.
Clearly, no cell scheduled after time t-n will have a timestamp
less than or equal to t-n, which is the timestamp of
c(i,j). This means that in order for c(i,j) to remain at its input
at time t, at time t-n there must have been at least nS-2 cells
aside from c(i,j) at input i and/or destined to output j which
had been scheduled by the scheduler S_q at or prior to the
beginning of time t-n, but were not transmitted yet by that
time. By the assumption of t-n being the earliest scheduling
time of any cell which violated delay bound n, all cells
which were scheduled prior to time t-2n+1 would have been
transmitted by time t-n. Therefore, the only cells which can
be not yet transmitted by time t-n are those which were
scheduled in the n cell slots t-2n+1, t-2n+2 . . . t-n, which
includes c(i,j) itself. Since the input scheduler S_q schedules
at most one cell per cell slot, there can be at most n-1
such cells (not counting c(i,j)) at input i. By the properties
of RSEFTF, each queue (i',j') can be scheduled at most
n r_i',j'+1 times in any n cell slots. Therefore, there can be
strictly less than Σ_i(n r_i,j)+n-1 cells scheduled in the
interval (t-2n+1, t-n) of n cell slots to output j (not
counting c(i,j)). This is because by the statement of the
theorem Σ_i(r_i,j)<1. Therefore, the total number of cells
at time t-n which are either at input i or destined to output j
which could still be at their inputs at the beginning of cell
slot t-n with timestamps less than or equal to t-n, not
counting c(i,j), is strictly less than 2n+Σ_i(n r_i,j)-1<3n.
Therefore, for any S≥3 there will not be enough cells with
timestamps smaller than or equal to that of c(i,j) to prevent
c(i,j) from leaving its input before time t. QED.

Property 3. For arbitrary 1≤S<3, with no assumption on
synchronization of cell slot and phase clocks, if the rate
assignment of guaranteed flows satisfies
(1) Σ_i(r_i,j)<S/3, Σ_j(r_i,j)≤S/3 OR
(2) Σ_i(r_i,j)≤S/3, Σ_j(r_i,j)<S/3,
then the arbitration delay of any cell is bounded by
3(n+m)/S-1.
Proof.

Consider the first time t this is violated for one or more cells
in the switch. That is, at the beginning of some cell slot
t there are one or more cells which were scheduled at or
before the beginning of cell slot t-3(n+m)/S and which
still remain at their inputs. Let c(i,j) be the cell with the
earliest scheduling time of all such cells (if there are
several cells with the same earliest scheduling time, we
pick any of them). This cell must have been scheduled by
its scheduler S_q at the beginning of cell slot t-3(n+m)/S.
This is because if it were scheduled later, it would not
have violated its delay bound by time t, and if it were
scheduled earlier, its delay bound would have been violated
before t, which would contradict the assumption that
t was the first time a violation occurred. It is easy to see
that there are at least S·3(n+m)/S-2=3(n+m)-2 full
matching phases which occur between the beginning of
cell slot t-3(n+m)/S and the beginning of cell slot t. By
Lemma 1, in order for cell c(i,j) to remain at its input at
time t, there must have been at least 3(n+m)-2 cells with
timestamps smaller than or equal to that of c(i,j) scheduled
from i and/or to j. Clearly, no cell scheduled after time
t-3(n+m)/S will have a timestamp less than or equal to
t-3(n+m)/S, which is the timestamp of c(i,j). This means
that in order for c(i,j) to remain at its input at time t, at
time t-3(n+m)/S there must have been at least 3(n+m)-2
cells aside from c(i,j) at input i and/or destined to output
j which had been scheduled by the scheduler S_q at or
prior to the beginning of time t-3(n+m)/S, but were not
transmitted yet by that time. By our assumption of
t-3(n+m)/S being the earliest scheduling time of any cell
which violated delay bound 3(n+m)/S-1, all cells which
were scheduled prior to time t-3(n+m)/S-(3(n+m)/S-1)
would have been transmitted by time t-3(n+m)/S.
Therefore, the only cells which can be not yet transmitted
by time t-3(n+m)/S are those which were scheduled in the
3(n+m)/S cell slots t-3(n+m)/S-(3(n+m)/S-1),
t-3(n+m)/S-(3(n+m)/S-2) . . . t-3(n+m)/S, which includes c(i,j)
itself. By the properties of RSEFTF, each queue (i',j') can
be scheduled at most τ r_i',j'+1 times in any τ cell slots.
Therefore, there can be at most Σ_i(3(n+m)/S r_i,j)+n-1
cells scheduled in the interval (t-3(n+m)/S-(3(n+m)/S-1),
t-3(n+m)/S) of 3(n+m)/S cell slots to output j (not
counting c(i,j)). Similarly, for rates satisfying condition
(1) there can be at most Σ_j(3(n+m)/S r_i,j)+m-1 cells at
the input i which were scheduled in this interval.
Therefore, the total number of cells at time t-3(n+m)/S which
are either at input i or destined to output j which could still be
at their inputs at the beginning of cell slot t-3(n+m)/S
with timestamps less than or equal to t-3(n+m)/S, not
counting c(i,j), is at most
Σ_j(3(n+m)/S r_i,j)+Σ_i(3(n+m)/S r_i,j)+m+n-2
< 2×(3(n+m)/S)×(S/3)+m+n-2 = 3(m+n)-2,
falling short of the required number of 3(m+n)-2 needed
to prevent c(i,j) from leaving by time t. Therefore, for any
1≤S<3 there will not be enough cells with timestamps
smaller than or equal to that of c(i,j) to prevent c(i,j) from
leaving its input before time t. QED.
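
As a worked illustration of Property 3, the sketch below evaluates the bound 3(n+m)/S-1 and checks rate conditions (1) and (2) for a given rate matrix. The function names and the list-of-rows layout for the rates r_i,j are assumptions made for the example; how admission control enforces the conditions is not addressed here.

```python
from fractions import Fraction


def property3_bound(n, m, speedup):
    """Arbitration delay bound 3(n+m)/S - 1, in channel cell slots."""
    return Fraction(3 * (n + m)) / Fraction(speedup) - 1


def satisfies_property3(rates, speedup):
    """rates[i][j] is r_i,j as a fraction of the channel rate."""
    limit = Fraction(speedup) / 3
    per_output = [sum(col) for col in zip(*rates)]  # sums over inputs i
    per_input = [sum(row) for row in rates]         # sums over outputs j
    cond1 = (all(s < limit for s in per_output)
             and all(s <= limit for s in per_input))
    cond2 = (all(s <= limit for s in per_output)
             and all(s < limit for s in per_input))
    return cond1 or cond2


quarter = Fraction(1, 4)
light = [[quarter, quarter], [quarter, quarter]]  # every sum is 1/2
bound = property3_bound(n=2, m=2, speedup=2)      # 3*4/2 - 1
ok = satisfies_property3(light, speedup=2)        # 1/2 against S/3 = 2/3
```

For a 2×2 switch with speedup 2 the bound evaluates to 5 cell slots, and the example rate matrix qualifies because every row and column sum (1/2) stays below S/3 = 2/3.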
What is claimed is: 

1. A method of providing bandwidth and delay guarantees
in an input-buffered switch with a speed-up S having input
channels and output channels for transferring cells
therebetween, comprising the steps of:

providing to each of the input channels per-output queues
for buffering cells awaiting transfer to the output
channels, each queue being associated with a respective
input channel and output channel, each queue having
an assigned rate and an ideal service associated therewith;

providing to each of the input channels a flow-level
scheduler capable of providing bandwidth and delay
guarantees to schedule cells awaiting transfer at the
input channels through flow queues and assign the cells
to ones of the per-output-channel queues;

providing to each input channel a rate controller for
scheduling at a given cell slot the per-output-channel
queues in the input channel to which it is provided, the
rate controller being capable of guaranteeing to each
queue an amount of actual service that is within fixed
bounds from the ideal service of the queue, the fixed
bounds each being equal to a value of one;

initializing state variables corresponding to the rate
controller;

initializing a channel clock counter value to zero;

running each rate controller to select one of the queues as
scheduled at the given cell slot;

associating with indices corresponding to the selected one
of the queues timestamps equal to the current time; and

for each matching phase where S is greater than or equal
to two, performing arbitration processing to control
transfer of the queued cells through the switch from the
input channels to the output channels, the step of
performing arbitration processing including the steps
of:

performing a maximal match computation using the
associated timestamps; and

indicating to each input channel the scheduled queues
from which the input channel may transfer a cell;

advancing by one the channel clock counter value;

determining if the per-output queue rates have changed;
and

returning to the step of running each rate controller.

2. A method of providing bandwidth and delay guarantees
in an input-buffered switch with a speed-up S according to
claim 1, wherein the maximal match computation comprises
the steps of:

providing a set_match set and a set_queues set,
the set_match set being initialized to an empty set and the
set_queues set to the set of the associated timestamps;

selecting the smallest of the associated timestamps stored
in the set_queues set;

adding the selected associated timestamp to the
set_match set and removing the selected associated
timestamp from the set_queues set;

deleting from the set_queues set all remaining associated
timestamps associated with per-output queues
corresponding to either the same input channel or output
channel as the selected associated timestamp;

if the set_queues set is empty, sending the indices of the
queues corresponding to the timestamps in the
set_match set to the input channels to which they belong;
and

if the set_queues set is not empty, then returning to the step
of selecting.

3. A method of providing bandwidth and delay guarantees
in an input-buffered switch with a speed-up S according to
claim 1, wherein the step of running the rate controller
comprises the steps of:

for each per-output queue, maintaining the state variables
to include a first and a second state variable, the first
state variable corresponding to an ideal beginning time
of the next cell of the per-output queue and the second
state variable corresponding to an ideal finishing time
of transmission of the next cell of the per-output queue;

selecting as eligible all per-output queues having an ideal
beginning time which is less than or equal to the current
channel clock counter value;

selecting as scheduled the eligible queue having the
smallest finish time; and

for the selected eligible queue, updating the first state
variable with the ideal finish time and the second state
variable with the ideal beginning time plus one divided
by the assigned rate.

4. A method of providing bandwidth and delay guarantees 
in an input-buffered switch with a speed-up S having input 
channels and output channels for transferring cells 
therebetween, comprising the steps of: 

providing to each of the input channels per-output queues 
for buffering cells awaiting transfer to the output 
channels, each queue being associated with a respective 
input channel and output channel, each queue having 
an assigned rate limited to 50% of port bandwidth and 
an ideal service associated therewith; 

providing to each of the input channels a flow-level 
scheduler capable of providing bandwidth and delay 
guarantees to schedule cells awaiting transfer at the 
input channels through flow queues and assign the cells 
to ones of the per-output-channel queues; 

providing to each input channel a rate controller for 
scheduling at a given cell slot the per-output-channel
queues in the input channel to which it is provided, the 
rate controller being capable of guaranteeing to each 
queue an amount of actual service that is within fixed 
bounds from the ideal service of the queue, the fixed 
bounds each being equal to a value of one; 

initializing state variables corresponding to the rate con- 
troller; 

initializing a channel clock counter value to zero; 

running each rate controller to select one of the queues as 
scheduled at the given cell slot; 

associating with indices corresponding to the selected one 
of the queues timestamps equal to the current time; and 

for each matching phase where S is greater than or equal 
to one but less than two, performing arbitration
processing to control transfer of the queued cells through
the switch from the input channels to the output 
channels, the step of performing arbitration processing 
including the steps of: 

performing a maximal match computation using the
associated timestamps; and

indicating to each input channel the scheduled queues
from which the input channel may transfer a cell;

advancing by one the channel clock counter value;

determining if the per-output queue rates have changed;
and

returning to the step of running each rate controller.

5. A method of providing bandwidth and delay guarantees 
in an input-buffered switch with a speed-up S according to 
claim 4, wherein the maximal match computation comprises 
the steps of: 

providing a set_match set and a set_queues set, the
set_match set being initialized to an empty set and the
set_queues set to the set of the associated timestamps;

selecting the smallest of the associated timestamps stored 
in the set_queues set; 

adding the selected associated timestamp to the
set_match set and removing the selected associated
timestamp from the set_queues set;

deleting from the set_queues set all remaining associated 
timestamps associated with per-output queues corre- 
sponding to either the same input channel or output 
channel as the selected associated timestamp; 

if the set_queues set is empty, sending the indices of the
queues corresponding to the timestamps in the
set_match set to the input channels to which they belong;
and 
if the set_queues set is not empty, then returning to step
of selecting.
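
The steps recited in claim 5 can be read as a greedy selection over timestamps. The following is a minimal sketch under stated assumptions, not the patented implementation: the function name `maximal_match`, the representation of a per-output queue as an `(input_channel, output_channel)` pair, and the tie-breaking rule for equal timestamps are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: a per-output queue is identified by (input_channel, output_channel).
QueueIndex = Tuple[int, int]


def maximal_match(timestamps: Dict[QueueIndex, float]) -> List[QueueIndex]:
    """Greedy maximal match over the timestamps of the scheduled queues."""
    # set_queues starts as the set of associated timestamps, set_match as empty.
    set_queues = dict(timestamps)
    set_match: List[QueueIndex] = []

    while set_queues:
        # Select the smallest timestamp. Breaking ties by queue index is an
        # assumption; the claim does not say how ties are resolved.
        chosen = min(set_queues, key=lambda q: (set_queues[q], q))
        set_match.append(chosen)

        # Remove the chosen entry and every remaining entry that shares its
        # input channel or its output channel.
        i, j = chosen
        set_queues = {
            q: t for q, t in set_queues.items() if q[0] != i and q[1] != j
        }

    # The indices in set_match are what would be sent to the input channels.
    return set_match
```

Because each pass removes every entry that conflicts with the chosen one, no input channel or output channel appears twice in the result, and no remaining queue could be added without a conflict, which is what makes the match maximal.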

6. A method of providing bandwidth and delay guarantees
in an input-buffered switch with a speed-up S according to
claim 4, wherein the step of running the rate controller
comprises the steps of:

for each per-output queue, maintaining the state variables
to include a first and a second state variable, the first
state variable corresponding to an ideal beginning time
of the next cell of the per-output queue and the second
state variable corresponding to an ideal finishing time
of transmission of the next cell of the per-output queue;

selecting as eligible all per-output queues having an ideal
beginning time which is less than or equal to the current
channel clock counter value;

selecting as scheduled the eligible queue having the
smallest finish time; and

for the selected eligible queue, updating the first state
variable with the ideal finish time and second state
variable with the ideal beginning time plus one divided
by the assigned rate.
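
One cell slot of the rate controller recited in claim 6 might be sketched as follows. This is an illustration under assumptions rather than the claimed implementation: the names `QueueState`, `init_state`, and `run_rate_controller` are hypothetical, the initial values (beginning time zero, finishing time one divided by the rate) are assumed because the claims only recite "initializing state variables", ties are broken by queue index, and the sketch does not check whether a queue actually holds a cell.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, Optional


@dataclass
class QueueState:
    rate: Fraction    # assigned rate, in cells per cell slot
    begin: Fraction   # first state variable: ideal beginning time of next cell
    finish: Fraction  # second state variable: ideal finishing time of next cell


def init_state(rate: Fraction) -> QueueState:
    # Assumed initialization: the first cell ideally starts at time zero.
    return QueueState(rate=rate, begin=Fraction(0), finish=1 / rate)


def run_rate_controller(queues: Dict[int, QueueState], clock: int) -> Optional[int]:
    """Return the index of the queue scheduled at this cell slot, or None."""
    # Eligible queues: ideal beginning time <= current channel clock value.
    eligible = [q for q, s in queues.items() if s.begin <= clock]
    if not eligible:
        return None

    # Schedule the eligible queue with the smallest ideal finishing time
    # (tie-break by queue index is an assumption).
    chosen = min(eligible, key=lambda q: (queues[q].finish, q))

    # Update: beginning time takes the old finishing time, then the finishing
    # time becomes the new beginning time plus one divided by the rate.
    state = queues[chosen]
    state.begin = state.finish
    state.finish = state.begin + 1 / state.rate
    return chosen
```

A caller would invoke `run_rate_controller` once per cell slot and advance the clock by one between calls. Exact fractions are used so that the accumulated times do not drift from rounding.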

7. An apparatus for providing bandwidth and delay
guarantees in an input-buffered switch with a speed-up S, the
input-buffered switch having input channels and output
channels for transferring cells therebetween, the apparatus
comprising:

per-output-channel queues in each of the input channels
for buffering cells awaiting transfer to the output
channels, each per-output-channel queue corresponding
to a respective input channel and output channel,
and having an assigned rate and an ideal service
associated therewith;

a flow-level scheduler in each of the input channels
capable of providing bandwidth and delay guarantees
for scheduling cells awaiting transfer at the input
channels through flow queues and assigning the cells to
ones of the per-output-channel queues;

a rate controller corresponding to each input channel for
scheduling for a given cell slot the per-output-channel
queues in the input channel to which it corresponds, the
rate controller being defined to guarantee to each queue
an amount of actual service that is within fixed bounds
from the ideal service of the queue, the fixed bounds
being equal to a value of one; and

an arbiter, responsive to the scheduling of queues by each
rate controller, for controlling the processing of the
queued cells through the switch from the input channels
to the output channels at a speedup S equal to a number
of phases per cell slot, where S is greater than or equal
to two, the arbiter using a maximal match computation
to choose one of the scheduled queues from which a
cell may be transmitted in each phase.

8. An apparatus for providing bandwidth and delay
guarantees in an input-buffered switch with speed-up according
to claim 7, wherein the rate controller is located in the
arbiter.

9. An apparatus for providing bandwidth and delay
guarantees in an input-buffered switch with speed-up according
to claim 7, wherein the rate controller is located in the input
channel to which it corresponds.

10. An apparatus for providing bandwidth and delay
guarantees in an input-buffered switch with speed-up
according to claim 7, where S is greater than or equal to one
but less than two with a load due to guaranteed flows limited
to 50% of port bandwidth.

***** 